Prosper Loan Analysis by MUKUL KOMMABATHULA

========================================================

Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.

Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : chr  "1021339766868145413AB3B" "10273602499503308B223C1" "0EE9337825851032864889A" "0EF5356002482715299901A" ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : chr  "2007-08-26 19:09:29.263000000" "2014-02-27 08:28:07.900000000" "2007-01-05 15:00:47.090000000" "2012-10-22 11:02:35.010000000" ...
##  $ CreditGrade                        : chr  "C" "" "HR" "" ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : chr  "Completed" "Current" "Completed" "Current" ...
##  $ ClosedDate                         : chr  "2009-08-14 00:00:00" "" "2009-12-17 00:00:00" "" ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : chr  "" "A" "" "A" ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : chr  "CO" "CO" "GA" "GA" ...
##  $ Occupation                         : chr  "Other" "Professional" "Other" "Skilled Labor" ...
##  $ EmploymentStatus                   : chr  "Self-employed" "Employed" "Not available" "Employed" ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : chr  "True" "False" "False" "True" ...
##  $ CurrentlyInGroup                   : chr  "True" "False" "True" "False" ...
##  $ GroupKey                           : chr  "" "" "783C3371218786870A73D20" "" ...
##  $ DateCreditPulled                   : chr  "2007-08-26 18:41:46.780000000" "2014-02-27 08:28:14" "2007-01-02 14:09:10.060000000" "2012-10-22 11:02:32" ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : chr  "2001-10-11 00:00:00" "1996-03-18 00:00:00" "2002-07-27 00:00:00" "1983-02-28 00:00:00" ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : chr  "$25,000-49,999" "$50,000-74,999" "Not displayed" "$25,000-49,999" ...
##  $ IncomeVerifiable                   : chr  "True" "True" "True" "True" ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : chr  "E33A3400205839220442E84" "9E3B37071505919926B1D82" "6954337960046817851BCB2" "A0393664465886295619C51" ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : chr  "2007-09-12 00:00:00" "2014-03-03 00:00:00" "2007-01-17 00:00:00" "2012-11-01 00:00:00" ...
##  $ LoanOriginationQuarter             : chr  "Q3 2007" "Q1 2014" "Q1 2007" "Q4 2012" ...
##  $ MemberKey                          : chr  "1F3E3376408759268057EDA" "1D13370546739025387B2F4" "5F7033715035555618FA612" "9ADE356069835475068C6D2" ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

Univariate Plots Section

Plotting Loan Term

- A 36-month term appears to be the one borrowers choose most often.
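The term distribution can also be checked numerically. A minimal sketch, using only the ten `Term` values visible in the `str()` output above as a stand-in for the full `Loan_data$Term` column (the variable names here are illustrative):

```r
# First ten Term values as displayed by str(); a stand-in for Loan_data$Term
term <- c(36, 36, 36, 36, 36, 60, 36, 36, 36, 36)

# Count loans per term and convert the counts to percentages
term_counts <- table(term)
term_share <- round(prop.table(term_counts) * 100, 1)

print(term_counts)
print(term_share)

# Most common term in this small sample
most_common_term <- names(term_counts)[which.max(term_counts)]
print(most_common_term)
```

Ten rows are far too few to draw conclusions from; the same calls applied to the full column give the counts behind the bar chart.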

  • Explore the loan origination quarter

  • Let’s take a closer look at the data on a yearly basis

  • From the graph we can see that there was a dip in the year 2009
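The report uses a `LoanOriginationYear` column without showing how it was created. One plausible way to derive it, assuming it comes from the year part of `LoanOriginationDate`, is sketched below with the four date strings visible in the `str()` output:

```r
# Origination timestamps as displayed by str(); a stand-in for
# Loan_data$LoanOriginationDate
origination_date <- c("2007-09-12 00:00:00", "2014-03-03 00:00:00",
                      "2007-01-17 00:00:00", "2012-11-01 00:00:00")

# Assumption: the year is simply the first four characters of the timestamp
origination_year <- as.integer(substr(origination_date, 1, 4))

# Loans per year, with each year's share of the total
year_counts <- table(origination_year)
year_share <- round(prop.table(year_counts) * 100, 2)
print(cbind(Count = year_counts, Percentage = year_share))
```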
# Looking into the percentage of borrowers per year


# Count and percentage of borrowers for each value of a column
tblFun <- function(x){
    tbl <- table(x)
    res <- cbind(tbl,round(prop.table(tbl)*100,2))
    colnames(res) <- c('Number Of Borrowers','Percentage')
    res
}

# Apply to the derived LoanOriginationYear column
do.call(rbind,lapply(Loan_data["LoanOriginationYear"],tblFun))
##      Number Of Borrowers Percentage
  • Next, we will look at the interest rates that Prosper loans carry
summary(Loan_data$BorrowerRate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

Plotting BorrowerRate

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
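The message above asks for an explicit bin width. As a rough base-R illustration of fixed-width binning (in ggplot2 the analogous choice is the `binwidth` argument of `geom_histogram()`), the sketch below bins the five `BorrowerRate` values visible in the `str()` output. The 0.05 width is an arbitrary choice for this tiny sample:

```r
# First five BorrowerRate values as displayed by str(); a stand-in for the column
borrower_rate <- c(0.158, 0.092, 0.275, 0.0974, 0.2085)

# Fixed-width bins of 0.05 covering 0 to 0.5 (the summary above reports a
# maximum rate of 0.4975)
bin_edges <- seq(0, 0.5, by = 0.05)
rate_hist <- hist(borrower_rate, breaks = bin_edges, plot = FALSE)

# Number of loans falling into each bin
print(data.frame(lower = head(bin_edges, -1),
                 upper = tail(bin_edges, -1),
                 count = rate_hist$counts))
```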

sum(Loan_data$BorrowerRate==0)
## [1] 8
Loan_data$ProsperRating..Alpha. <- ordered(Loan_data$ProsperRating..Alpha., 
                                levels = c("AA","A","B","C","D","E","HR",""))
levels(Loan_data$ProsperRating..Alpha.)
## [1] "AA" "A"  "B"  "C"  "D"  "E"  "HR" ""
table(Loan_data$ProsperRating..Alpha.)
## 
##    AA     A     B     C     D     E    HR       
##  5372 14551 15581 18345 14274  9795  6935 29084

Plotting prosper rating for borrower

ggplot(aes(x = ProsperRating..Alpha.), data = Loan_data) +
  geom_bar(fill = '#369b80',color = '#0542c4')
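In the table above, 29,084 loans have an empty rating and end up as their own factor level. One possible alternative, shown here only as a sketch on the four values visible in the `str()` output, is to treat the empty string as missing before building the ordered factor, so that those rows can be dropped or counted separately:

```r
# First four ProsperRating..Alpha. values as displayed by str()
rating <- c("", "A", "", "A")

# Treat the empty string as a missing rating
rating[rating == ""] <- NA

# Same level order as in the report, without the "" level
rating <- factor(rating,
                 levels = c("AA", "A", "B", "C", "D", "E", "HR"),
                 ordered = TRUE)

# useNA = "ifany" keeps the missing ratings visible in the count
print(table(rating, useNA = "ifany"))
```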

# Create a new variable that shows the full category name instead of a number for the listing category

Loan_data$ListingCategory..string <- mapvalues(Loan_data$ListingCategory..numeric.,
                           from = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
                                    16,17,18,19,20),
                           to = c("Not Available", "Debt Consolidation", 
                                  "Home Improvements", "Business", 
                                  "Personal Loan","Student Use","Auto",
                                  "Other","Baby&Adoption","Boat",
                                  "Cosmetic Procedure","Engagement Ring",
                                  "Green Loans","Household Expenses",
                                  "Large Purchases","Medical/Dental",
                                  "MotorCycle","RV","Taxes","Vacation",
                                  "Wedding Loans"))

# Create a table to explore the number of borrowers in each category 
table(Loan_data$ListingCategory..string)
## 
##               Auto      Baby&Adoption               Boat           Business 
##               2572                199                 85               7189 
## Cosmetic Procedure Debt Consolidation    Engagement Ring        Green Loans 
##                 91              58308                217                 59 
##  Home Improvements Household Expenses    Large Purchases     Medical/Dental 
##               7433               1996                876               1522 
##         MotorCycle      Not Available              Other      Personal Loan 
##                304              16965              10494               2395 
##                 RV        Student Use              Taxes           Vacation 
##                 52                756                885                768 
##      Wedding Loans 
##                771
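The same recoding can be done without `mapvalues()` by indexing into a vector of labels. A minimal sketch, using the ten `ListingCategory..numeric.` values visible in the `str()` output and the labels listed above (`category_labels` and `category_code` are illustrative names):

```r
# Labels in code order: position 1 corresponds to code 0, position 21 to code 20
category_labels <- c("Not Available", "Debt Consolidation", "Home Improvements",
                     "Business", "Personal Loan", "Student Use", "Auto", "Other",
                     "Baby&Adoption", "Boat", "Cosmetic Procedure",
                     "Engagement Ring", "Green Loans", "Household Expenses",
                     "Large Purchases", "Medical/Dental", "MotorCycle", "RV",
                     "Taxes", "Vacation", "Wedding Loans")

# First ten category codes as displayed by str()
category_code <- c(0, 2, 0, 16, 2, 1, 1, 2, 7, 7)

# Codes start at 0 while R vectors are 1-indexed, hence the + 1
category_name <- category_labels[category_code + 1]
print(table(category_name))
```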

Plotting Listing Category

Plotting BorrowerState

summary(Loan_data$LoanOriginalAmount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

- The distribution is positively skewed. The minimum loan amount is 1000 and the maximum is 35000, while the third quartile is only 12000, so there is a big gap between Q3 and the maximum.

summary(Loan_data$StatedMonthlyIncome)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

. There seems to be an outlier: the maximum stated monthly income is 1750003, far above the third quartile of 6825.

. I will change the x limits to see the graph more closely.
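To show how a single extreme value distorts the mean and how a quantile-based limit removes it, here is a small sketch. The vector combines the five `StatedMonthlyIncome` values displayed by `str()` with the maximum from the summary above; the 95th-percentile cutoff is an assumption borrowed from the plotting code later in the report:

```r
# Five displayed income values plus the reported maximum as the outlier
income <- c(3083, 6125, 2083, 2875, 9583, 1750003)

# Upper limit at the 95th percentile
upper_limit <- quantile(income, 0.95)
income_trimmed <- income[income <= upper_limit]

# The mean is dragged upward by the outlier; the median barely notices it
print(c(mean_all = mean(income),
        median_all = median(income),
        mean_trimmed = mean(income_trimmed)))
```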

# Let's check the number of borrowers with zero income
sum(Loan_data$StatedMonthlyIncome == 0)
## [1] 1394

. A total of 1394 people got loans with zero stated income. This group includes listings created both before and after 2009, so there is no reason to think it is tied to a particular period of special interest to lenders. It is interesting to see that all of these people fall under the zero income or not employed ranges. Maybe they have shown some property to get the loan, or they are doing some other kind of job that does not count as monthly income.

table(Loan_data$IncomeRange)
## 
##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

Plotting Income range

ggplot(aes(x=IncomeRange), data=Loan_data) +
  geom_bar(fill='#369b80') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5,hjust = 1))
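By default the income ranges are sorted alphabetically, which puts `$100,000+` before `$25,000-49,999`. A possible fix, sketched on the four `IncomeRange` values visible in the `str()` output, is an ordered factor with explicit levels. Where the two non-numeric categories go is an arbitrary choice made here:

```r
# First four IncomeRange values as displayed by str()
income_range <- c("$25,000-49,999", "$50,000-74,999",
                  "Not displayed", "$25,000-49,999")

# Levels taken from the table above, arranged from lowest to highest income
income_levels <- c("Not displayed", "Not employed", "$0", "$1-24,999",
                   "$25,000-49,999", "$50,000-74,999", "$75,000-99,999",
                   "$100,000+")

income_range <- factor(income_range, levels = income_levels, ordered = TRUE)

# Counts now follow the income order rather than the alphabet
print(table(income_range))
```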

. Most borrowers fall in the income ranges from $25,000 to $74,999.

. Let’s look into the debt to income ratio graph.

summary(Loan_data$DebtToIncomeRatio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

. To get a clearer graph we will limit it to the 99th percentile.

# Compute the 50th, 90th, and 99th percentiles of the debt to income ratio
debt_income_ratio <- subset(Loan_data, !is.na(DebtToIncomeRatio))
quantile(debt_income_ratio$DebtToIncomeRatio, c(0.5, 0.9, 0.99))
##  50%  90%  99% 
## 0.22 0.42 0.86

. Now the graph looks much better. About 99% of the debt to income ratios are below 0.86. This is a sensible number because people cannot spend all of their income on loan payments.

. Let’s investigate the number of people whose debt to income ratio is greater than 1!

# Check number of borrowers with DebtToIncomeRatio > 1
table(Loan_data$DebtToIncomeRatio > 1)
## 
##  FALSE   TRUE 
## 104584    799

. Let’s look into their loans’ status.
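The code for this step is not shown in the report. One way it could be done is sketched below on a hypothetical toy data frame; the rows are made up for illustration and are not taken from the dataset, and the `Defaulted` status label is an assumption:

```r
# Hypothetical toy rows, for illustration only
loans <- data.frame(
  DebtToIncomeRatio = c(0.17, 1.50, NA, 10.01, 0.26, 2.30),
  LoanStatus = c("Completed", "Current", "Completed",
                 "Defaulted", "Current", "Completed"),
  stringsAsFactors = FALSE
)

# which() drops rows where the ratio is NA instead of returning NA rows
high_dti <- loans[which(loans$DebtToIncomeRatio > 1), ]

# Loan status counts and shares among borrowers with a ratio above 1
status_counts <- table(high_dti$LoanStatus)
print(status_counts)
print(round(prop.table(status_counts) * 100, 1))
```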

Univariate Analysis

What is the structure of your dataset?

What is/are the main feature(s) of interest in your dataset?

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Did you create any new variables from existing variables in the dataset?

Of the features you investigated, were there any unusual distributions?

Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Bivariate Plots Section

Here, I set up a data frame containing the variables that are of interest for further analysis.

# Subset a dataframe to Explore some variables
selected_df <- subset(Loan_data, select = c(BorrowerAPR,BorrowerRate, LenderYield,
ProsperRating..numeric.,CreditScoreRangeLower,CreditScoreRangeUpper,
CurrentCreditLines,OpenCreditLines,TotalCreditLinespast7years,
OpenRevolvingAccounts,TotalInquiries,AmountDelinquent,RevolvingCreditBalance,
BankcardUtilization,AvailableBankcardCredit,DebtToIncomeRatio,
LoanMonthsSinceOrigination,LoanOriginalAmount, MonthlyLoanPayment,Investors))

ggcorr(selected_df, hjust=0.95, size = 2.7, label = TRUE, label_size = 3, layout.exp = 3.5, color = 'black')
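The numbers behind a correlation plot like this one can be inspected directly with base R's `cor()`. A minimal sketch on three columns, using the ten values each displayed by `str()`; handling missing values with pairwise-complete observations is an assumption made here:

```r
# Ten displayed values per column; a stand-in for the full selected_df
sample_df <- data.frame(
  CreditScoreRangeLower = c(640, 680, 480, 800, 680, 740, 680, 700, 820, 820),
  CurrentCreditLines    = c(5, 14, NA, 5, 19, 21, 10, 6, 17, 17),
  TotalInquiries        = c(3, 5, 1, 1, 9, 2, 0, 16, 6, 6)
)

# Pearson correlations; each pair uses the rows where both values are present
cor_matrix <- cor(sample_df, use = "pairwise.complete.obs", method = "pearson")
print(round(cor_matrix, 2))
```

With only ten rows the coefficients are not meaningful on their own; the point is the shape of the call.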

## Warning: Use of `Loan_data$ProsperRating..Alpha.` is discouraged.
## ℹ Use `ProsperRating..Alpha.` instead.

. As we can see, the borrower rate increases as the Prosper rating gets worse.

. Now we will analyze the basis on which the Prosper rating is given!

Loan_data$EmploymentStatus <- ordered(Loan_data$EmploymentStatus, levels = c("Not employed",
                                      "Other","Self-employed", "Employed",
                                      "Part-time","Retired","Full-time"))

ggplot(aes(x = EmploymentStatus), data = subset(Loan_data, !is.na(Loan_data$ProsperRating..numeric.))) +
  geom_bar(aes(fill = ProsperRating..Alpha.), position = 'fill')

. It seems that employment status plays a role in determining the Prosper rating. Employed borrowers tend to have a better Prosper rating than borrowers who are not employed.

. We will see how income range influences the Prosper rating.

. It is clear that the higher the income range, the better the Prosper rating. That is probably because these borrowers can comfortably pay their debts on time.

. We will see how credit score influences the Prosper rating.

. As the credit score increases, the Prosper rating also improves.

. Now we will see what factors influence credit score.

ggplot(aes(x = factor(CreditScoreRangeLower), y = CurrentCreditLines), data = subset(Loan_data, CreditScoreRangeLower>500)) +
  geom_boxplot()
## Warning: Removed 5797 rows containing non-finite values (`stat_boxplot()`).

# Check the correlation between CreditScoreRangeLower and CurrentCreditLines
with(Loan_data, cor.test(CreditScoreRangeLower,CurrentCreditLines, method = "pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and CurrentCreditLines
## t = 46.809, df = 106331, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1361976 0.1479760
## sample estimates:
##       cor 
## 0.1420918

. It shows us that more credit lines go with a better credit score, although the correlation is weak (about 0.14).
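With more than 100,000 observations, even a small correlation produces a tiny p-value, so the coefficient itself is the more useful number. The sketch below recomputes the t statistic from the reported `cor` and `df` using the standard relation t = r * sqrt(df) / sqrt(1 - r^2), and shows how little variance the relationship accounts for:

```r
# Values copied from the cor.test() output above
r  <- 0.1420918
df <- 106331

# t statistic implied by r and the degrees of freedom
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Share of variance in one variable accounted for by the other
shared_variance <- r^2

print(round(t_stat, 2))
print(round(shared_variance, 4))
```

The recomputed t agrees with the reported 46.809 to within rounding, while r^2 is only about 0.02.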

# Check the correlation between CreditScoreRangeLower and TotalInquiries
with(Loan_data, cor.test(CreditScoreRangeLower, TotalInquiries))
## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and TotalInquiries
## t = -96.631, df = 112776, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2819071 -0.2711270
## sample estimates:
##        cor 
## -0.2765257

. The fewer the inquiries, the better the score.

# Check the correlation between BorrowerRate and CreditScoreRangeLower
with(Loan_data, cor.test(BorrowerRate,CreditScoreRangeLower, method = "pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  BorrowerRate and CreditScoreRangeLower
## t = -175.17, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4661358 -0.4569730
## sample estimates:
##        cor 
## -0.4615667

. Borrowers with higher credit scores get better (lower) interest rates.

. Now we will see how monthly income, term, and loan original amount are influenced by different factors!

# Plotting StatedMonthlyIncome by MonthlyLoanPayment
ggplot(aes(x = StatedMonthlyIncome, y = MonthlyLoanPayment), data = Loan_data) +
  geom_point(alpha = 1/10, fill=I("#ea56b1"),color=I("black"),shape=21)+
  geom_smooth(method = "lm", color = 'red') +
  scale_x_continuous(limits = c(0, quantile(Loan_data$StatedMonthlyIncome, 0.95)))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 5677 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 5677 rows containing missing values (`geom_point()`).

# Check the correlation between StatedMonthlyIncome and MonthlyLoanPayment
with(Loan_data, cor.test(StatedMonthlyIncome,MonthlyLoanPayment, method = "pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and MonthlyLoanPayment
## t = 67.764, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1912423 0.2024055
## sample estimates:
##       cor 
## 0.1968303

. People who have more income tend to have higher monthly loan payments, though the correlation is weak (about 0.2).

# Check the correlation between StatedMonthlyIncome and LoanOriginalAmount
with(Loan_data, cor.test(StatedMonthlyIncome,LoanOriginalAmount, method = "pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1956816 0.2068243
## sample estimates:
##       cor 
## 0.2012595

. The higher the income, the higher the loan amount taken.

# display borrowers' income range
table(Loan_data$IncomeRange)
## 
##             $0      $1-24,999 $25,000-49,999 $50,000-74,999 $75,000-99,999 
##            621           7274          32192          31050          16916 
##      $100,000+ 
##          17337

. But above the $25,000-49,999 range, the number of people taking loans generally decreases as income increases. This seems right because people with higher incomes are more self-sufficient and may not need personal loans.

. People are taking higher loan amounts for debt consolidation and baby&adoption.
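The plot behind this observation is not reproduced here. A comparison of typical loan size per category can be sketched with `tapply()`, using the ten `LoanOriginalAmount` values and listing categories visible in the `str()` output (the codes 0, 1, 2, 7, and 16 are written out with the labels defined earlier):

```r
# First ten loan amounts as displayed by str()
loan_amount <- c(9425, 10000, 3001, 10000, 15000,
                 15000, 3000, 10000, 10000, 10000)

# Matching listing categories (codes 0 2 0 16 2 1 1 2 7 7 mapped to labels)
category <- c("Not Available", "Home Improvements", "Not Available",
              "MotorCycle", "Home Improvements", "Debt Consolidation",
              "Debt Consolidation", "Home Improvements", "Other", "Other")

# Median loan amount per category, largest first
median_by_category <- sort(tapply(loan_amount, category, median),
                           decreasing = TRUE)
print(median_by_category)
```

Ten rows cannot confirm or contradict the observation above; on the full data the same call gives one median per listing category.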

. Term has influence over borrower rate.

# Check the correlation between LoanOriginalAmount and BorrowerRate
with(Loan_data, cor.test(LoanOriginalAmount,BorrowerRate, method = "pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3341283 -0.3237719
## sample estimates:
##        cor 
## -0.3289599

. As the loan amount increases, interest rates tend to be lower (correlation of about -0.33).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Borrower rate is related to Prosper rating, credit score, loan original amount, and term. There is a moderate negative relationship between borrower rate and credit score, with a Pearson correlation of about -0.46. In turn, credit score is related to total inquiries, credit lines, and monthly loan payments. Loan original amount is related to term, employment status, and listing category.
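The coefficients reported by `cor.test()` in this section are Pearson r values, which can be negative. Squaring them gives R^2, the share of variance accounted for, which cannot be negative. A short sketch with the three coefficients copied from the outputs above (the vector names are illustrative):

```r
# Pearson r values copied from the cor.test() outputs in this section
r <- c(rate_vs_credit_score  = -0.4615667,
       income_vs_loan_amount =  0.2012595,
       loan_amount_vs_rate   = -0.3289599)

# R^2 is the square of r and is always between 0 and 1
r_squared <- r^2

print(round(cbind(r, r_squared), 3))
```

Even the strongest of the three, borrower rate against credit score, accounts for only about a fifth of the variance.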

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

  1. Larger loan payments, fewer inquiries, and more credit lines go with a better credit score.
  2. People who earn more are likely to take larger loan amounts. But as income increases, the number of people taking loans decreases.
  3. Employment status has an influence on loan amount. Employed borrowers seem to have the opportunity to apply for higher loan amounts.
  4. Borrowers can get higher loans when they choose to pay off over more years.
  5. Interest rates tend to be lower for higher loan amounts.
  6. People are taking higher loan amounts for debt consolidation and baby&adoption.

Multivariate Plots Section

. In this section, we will see how the main factors are interrelated.

. At the same level of Prosper rating and credit score, a longer term gives borrowers the chance to apply for a higher loan amount.

. We will see whether income influences loan amount. In the bivariate analysis, we saw that loan original amount and stated monthly income have a Pearson correlation of about 0.2.

. Now we will see how they behave when term comes into the picture.

. Borrowers who have a good Prosper rating can get lower borrower rates and, at the same time, take higher loans.

. Even if income is low, people have the opportunity to take higher loan amounts when they choose to pay off over 5 years. This seems reasonable because borrowers will have affordable monthly loan payments and their debt to income ratio will stay well below 1.
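The effect of term on the monthly payment can be made concrete with the standard formula for a fully amortized loan, payment = P * i / (1 - (1 + i)^(-n)), where i is the monthly rate and n the number of months. Monthly compounding at `BorrowerRate / 12` is an assumption here, though it is consistent with the second row of the `str()` output (10000 borrowed at 0.092 over 36 months, monthly payment shown as 319). `monthly_payment()` is an illustrative helper, not a function from the report:

```r
# Monthly payment on a fully amortized loan (assumes monthly compounding)
monthly_payment <- function(principal, annual_rate, term_months) {
  monthly_rate <- annual_rate / 12
  if (monthly_rate == 0) {
    # Zero-interest loans are repaid in equal slices of the principal
    return(principal / term_months)
  }
  principal * monthly_rate / (1 - (1 + monthly_rate)^(-term_months))
}

# Same loan amount and rate, paid off over 3 years versus 5 years
payment_36 <- monthly_payment(10000, 0.092, 36)
payment_60 <- monthly_payment(10000, 0.092, 60)

print(round(c(months_36 = payment_36, months_60 = payment_60), 2))
```

The longer term lowers the monthly payment at the same rate, which is why a 5-year term keeps the debt to income ratio down for a given loan amount. In practice the rate may also differ by term.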

. Overall, borrowers in every employment status can get higher loans, but they have to choose a longer term. Still, the graph clearly shows that employed borrowers take larger loan amounts than others in each term group.

. Next is the graph of loan original amount vs. income range.

. In this case too, borrowers can take higher loans when they are willing to repay over a longer term and when they earn more.

. In the bivariate analysis, we saw that higher loan original amounts come with better interest rates, with a Pearson correlation of about -0.33. But when term comes into the picture, interest rates are a little higher.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

Regardless of credit score, Prosper rating, employment status, and monthly income, borrowers have the opportunity to take higher loan amounts. But they have to choose to pay off over a longer term.

Were there any interesting or surprising interactions between features?

People who have more income are likely to take higher loan amounts. When I further analyzed loan original amount with respect to borrower rate, I found that people can borrow more money, but when term comes into the picture, interest rates are a little higher.


Final Plots and Summary

Plot One

Description One

Borrowers who have a good Prosper rating can get lower borrower rates and, at the same time, take higher loans. People who have a lower Prosper rating cannot take higher loans such as $30,000, and they have to pay higher borrower rates even for smaller loan amounts. This trend seems quite normal because lenders take on risk when giving loans to people who have a bad Prosper rating, so lenders should get some benefit from higher interest rates. It is similar to the stock market: someone who takes the risk might see a large profit or a large loss.

Plot Two

Description Two

From this boxplot it is clear that borrowers can take higher loans when they are willing to repay over a longer term and when they earn more. Prosper also seems to make sure that even people who take higher loan amounts have a debt to income ratio of less than 1.


Reflection

. The data set had nearly 114,000 loans from Nov 2005 to March 2014. After 2009 the number of loans increased drastically. Prosper also changed its business model in 2009, and this might have attracted many borrowers.

. Previously, lenders determined the borrower rate; now Prosper sets interest rates depending on credit risk. Many interesting insights can be drawn from this data. Initially, I was very confused by the large number of variables, but as time progressed I got the hang of them. It is also surprising to see that the purposes for which people take loans have changed drastically over the years.

. I think that a lot more can be analyzed using this data, such as why some people are not able to pay their loans on time, what determines interest rates, what reasons make people take loans, and so on.

